Statistical Literacy and Scientific Investigation 


An independent experimental investigation represents the perfect teaching opportunity 
to develop the statistical as well as scientific literacy of your students. It has been
suggested, in fact, that “statistics is, or should be, about scientific investigation and how
to do it better” (Higgins, 1999, cited in Bulmer, 2002). When students are given the
opportunity to design and conduct their own experiments, they become enthused, 
inquisitive and creative. There is also extensive evidence that statistical concepts are 
difficult for many students to grasp when they are introduced without a meaningful 
context (see for example Garfield and Ben-Zvi, 2007; Fitzsimmons, Williams and Ma, 
2005). 


So how can you best take advantage of this opportunity (real data, meaningful context, 
engaging content) to genuinely improve the statistical knowledge and skills of your 
students? 


The Experimental Cycle 

Figure 1 outlines the steps involved in a scientific experiment and makes clear the
cyclical nature of research, where the results from previous experiments inform the 
planning of new ones. 


Plan
- define problem
- identify variables
- operationalise

Preliminary trials
- refine method
- refine apparatus
- refine the problem

Conduct experiment
- repeated trials
- careful processes
- accurate recording
- replicated trials

Process data
- organise
- graph
- descriptive statistics

Analyse
- interpret
- inferential statistics
- infer / predict

Fig 1: Steps involved in experimental design 


This is obviously a stylised representation of an experimental investigation, as the parts
are not as distinctly separate as this figure implies. Nonetheless, it can be a useful
model for students, who often see experiments as completely linear:


Fig 2: How students often see the experimental process. 


Before you Begin 


Planning — both for the experiment and for the statistical analysis — is the crucial first 
step to a good experiment. Students often need help forming a research question 
(operationalising the issue in which they are interested). Judicious questioning of the 
student can help them formulate their experiment in a scientifically and statistically 
useful and valid manner. 


Exactly what is the question they want to answer? What are the variables involved? 
Why those variables? How will they measure their independent variable? How will 
they measure their dependent variable? How will they control for other factors 
influencing their results? What assumptions have been made about the variables? What 
tools are available for measurement? Are those tools appropriate? What specifically is 
their hypothesis? 


Planning the statistical analysis is equally important and also needs to be addressed 
before the experiment is conducted. Does the hypothesis predict a difference in one 
variable after different treatments? A correlation between two variables? An interaction 
between two (or more) variables? What statistics are appropriate to address each of 
these experimental questions? In other words how will the student know whether their 
hypothesis is supported? What do they need to measure to do that analysis? What 
inferences do they hope to draw from their results? Are those inferences valid? What 
assumptions underlie the statistical techniques? 


Conducting 

Of course, due care and diligence must be taken when conducting the experiment and
recording results: this gives another chance to embed difficult statistical concepts in
a real situation. What and where are the limits to accuracy in the data collection? How
can accuracy be improved? Encouraging students to think about the accuracy of the
data they collect reinforces concepts such as limits to accuracy and confidence
intervals. These are concepts that students often find difficult. After all, an ‘error’ in
everyday usage means ‘doing something wrong’; many students find it difficult to
accept error analysis as a valid part of experimental work.
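
One way to make ‘limits to accuracy’ concrete is to have students repeat a measurement and report an interval rather than a single number. The following is a minimal Python sketch, not a prescribed method. The pendulum timings are invented for illustration, the function name is hypothetical, and the multiplier 2.262 is the tabulated two-sided 95% t value for 9 degrees of freedom, so it applies only to a sample of ten readings.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical data: ten repeated timings of one pendulum swing (seconds).
timings = [2.01, 1.98, 2.03, 2.00, 1.97, 2.02, 1.99, 2.04, 2.00, 1.96]

# Tabulated t value for a 95% interval with n - 1 = 9 degrees of freedom.
# A different sample size needs a different value from a t-table.
T_95_DF9 = 2.262


def confidence_interval(values, t_value):
    """Return (lower, upper) bounds of the interval for the mean."""
    centre = mean(values)
    # Standard error: sample standard deviation divided by sqrt(n).
    standard_error = stdev(values) / sqrt(len(values))
    half_width = t_value * standard_error
    return centre - half_width, centre + half_width


lower, upper = confidence_interval(timings, T_95_DF9)
print(f"mean = {mean(timings):.3f} s")
print(f"95% interval: {lower:.3f} s to {upper:.3f} s")
```

The width of the interval comes from the spread of the repeated readings, not from any ‘mistake’, which can help students see error analysis as a legitimate part of the work.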


A Statistical Eye 

An effective method to get your students examining their data in a scientific and
statistically valid manner is to ask them to look at their results. What do they see? Are
there any patterns? What stands out? Are the data very uniform? These questions will 
usually mean that they need to graph their data. How will they do that? Because it is in 
a context that is meaningful and important to them, the chances are they will begin to 
engage with the statistical concepts and ideas on a deeper and more meaningful level. If 
they are to ‘look at’ their data, they need to organise their data; they need to think about 
the ways to organise it. Will they group their data? How? Why will they group it that 
way? How can they graph their data: dot plots, histograms, box and whisker plots, line 
graphs, pie charts? Which is most useful and relevant and why? 

Some students will need more assistance than others in answering (or even asking) 
these questions. Some students will simply draw a pie chart, or a dot graph, or whatever 
their ‘favourite’ or most familiar representation is, whether it is valid or not. However,
this can be a great learning opportunity: a chance to engage with the student about why
they are doing something, and a chance to help the student understand where and when to
use different techniques. Often different graphical representations will highlight
different aspects of the data, and this can improve students’ ability to engage with
complexity. (See, for example, http://exploringdata.cqu.edu.au)


Statistical Measures 
Statistics turn large amounts of data into something from which conclusions can be 
drawn and inferences or predictions made. 


After students have examined their data in a general sense, they will be left with a 
number of questions: What does the difference between groups signify? How strong is 
the relationship? Is it a direct linear relationship? Why is the range of the data so broad 
(or narrow)? Why do lower values have more (or less) variability than higher values? 


Students now have a reason and a context for the descriptive statistics that they will 
find. Which is the best measure to use for central tendency? If the data has a more or
less normal distribution, then the mean is preferable. However, if the data is strongly
skewed, it is likely the median is a more informative and useful measure. With
categorical data that has no natural order, the mode is the only measure of central
tendency that is applicable.
Once the students have examined their own data visually, these distinctions become 
clearer and more intelligible to them. 
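
The contrast between these measures can be shown with a few lines of Python. This is only an illustrative sketch: both data sets are invented, with one unusually slow reaction time included to skew the first set.

```python
from statistics import mean, median, mode

# Hypothetical data: reaction times in milliseconds, with one slow response
# that gives the distribution a long right tail.
reaction_times = [210, 230, 240, 250, 260, 270, 290, 950]

# Hypothetical categorical data: preferred hand of each participant.
preferred_hand = ["right", "right", "left", "right", "left", "right"]

# The single large value pulls the mean well above most of the readings,
# while the median stays close to the bulk of the data.
print("mean of reaction times:  ", mean(reaction_times))
print("median of reaction times:", median(reaction_times))

# A mean or median of hand labels has no meaning; only the mode applies.
print("mode of preferred hand:  ", mode(preferred_hand))
```

Asking students to predict which of the mean and median will be larger before running the code links the calculation back to what they saw in their graph.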


Sometimes data will be bimodal. If students have simply calculated a mean, before they 
have ‘eye-balled’ their data, they will have missed this important characteristic, and 
their mean is likely to be largely meaningless. Bimodal data can point to different sub- 
categories (such as male and female). When the data is examined separately for each 
sub-category, two normal but shifted distributions can often be seen. Calculating 
separate means for each of these is both legitimate and meaningful. 
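
A short Python sketch of this situation follows. The hand-span values and the two group labels are invented for illustration, and the helper name text_dot_plot is hypothetical. The sketch prints a rough text dot plot of the pooled data and then compares the overall mean with the mean of each sub-category.

```python
from statistics import mean

# Hypothetical measurements: (sub-category, hand span in cm).
measurements = [
    ("group A", 17), ("group A", 18), ("group A", 18), ("group A", 19),
    ("group B", 22), ("group B", 23), ("group B", 23), ("group B", 24),
]


def text_dot_plot(values):
    """Print one row per whole-number value, with a dot per observation."""
    for v in range(min(values), max(values) + 1):
        print(f"{v:>3} | " + "." * values.count(v))


all_values = [value for _, value in measurements]
text_dot_plot(all_values)  # two separate clusters are visible

overall_mean = mean(all_values)

# Calculate a separate mean for each sub-category.
group_means = {}
for label in sorted({group for group, _ in measurements}):
    group_means[label] = mean([v for g, v in measurements if g == label])

print("overall mean:", overall_mean)
print("group means: ", group_means)
```

In this invented data set the overall mean falls in the gap between the two clusters, at a value that no participant actually recorded, whereas each group mean sits in the middle of its own cluster.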


The question of outliers is another serious issue that students will often confront. Is the 
outlier a true value? Is it the result of an experimental or recording mistake? When is it
OK to disregard an outlier? Because the answer to this last question is at least partly 
subjective, students can think that it’s OK to disregard values that ‘don’t fit the 
hypothesis’. This is a great chance to engage with your students about the scientific 
method and scientific reasoning. 


Outliers can also help to demonstrate to students why the simple range of data is often
not the best measure of spread. They can show why measures such as the interquartile
range convey more information about the population without needing to make a large 
number of assumptions. 
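
The effect of a single outlier on the two measures can be sketched in Python as follows. The data values are invented, and the helper names are hypothetical. Note that statistics.quantiles uses one of several common quartile conventions, so a calculator or spreadsheet may give slightly different quartiles for the same data.

```python
from statistics import quantiles

# Hypothetical data: seedling heights in cm, without and with one outlier.
clean = [12, 13, 13, 14, 15, 15, 16, 17]
with_outlier = clean + [48]


def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def interquartile_range(values):
    """Upper quartile minus lower quartile."""
    lower_q, _, upper_q = quantiles(values, n=4)
    return upper_q - lower_q


for name, values in [("clean", clean), ("with outlier", with_outlier)]:
    print(f"{name:>12}: range = {data_range(values)}, "
          f"IQR = {interquartile_range(values)}")
```

The range changes dramatically when the one extreme value is added, while the interquartile range barely moves, because it depends only on the middle half of the data.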


Drawing conclusions 


Once students have organised, examined and computed some statistics from their data, 
they need to draw conclusions from their investigation. Evidence needs to be weighed 
and arguments made, contradictory evidence needs to be assessed and alternative 
explanations canvassed. These can be difficult steps for students, who often ‘expect 
there is a single correct answer. Yet this is not the case in statistical investigations’ 
(Woodward and Pfannkuch, 2007, p. 839). Empirical evidence is by its nature
probabilistic and hence inherently includes uncertainty.


So what does it all mean? Do the results support the hypothesis? When does a 
difference between two or more numbers mean something? If the student is examining
how different treatments have affected one variable, how do they know if any
differences are more likely to be a result of treatment rather than random chance? If the
students are looking for a relationship between two variables, how can they examine
the strength of the relationship? Does the relationship always hold, or only under some
conditions?
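
One way to let students explore the ‘treatment or random chance’ question directly is a shuffling (permutation) test. It is offered here as an illustration and is not a technique named in this article. The idea is to mix up the group labels many times and count how often chance alone produces a difference at least as large as the one observed. In the sketch below the plant heights, the function names and the number of shuffles are all assumptions made for the example.

```python
import random


def mean_of(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_shuffles=2000, seed=1):
    """Estimate how often randomly relabelled groups differ in mean by at
    least as much as the observed groups do."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(mean_of(group_a) - mean_of(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # break any link between label and value
        diff = abs(mean_of(pooled[:n_a]) - mean_of(pooled[n_a:]))
        if diff >= observed:
            at_least_as_large += 1
    return at_least_as_large / n_shuffles


# Hypothetical plant heights in mm.
control = [52, 55, 49, 58, 51, 54]
fertiliser = [61, 66, 59, 64, 68, 62]
control_repeat = [53, 50, 56, 54, 52, 57]

p_treatment = permutation_p_value(control, fertiliser)
p_repeat = permutation_p_value(control, control_repeat)
print("control vs fertiliser:    ", p_treatment)
print("control vs second control:", p_repeat)
```

A small proportion suggests the observed difference would rarely arise from shuffling alone. A large proportion suggests it easily could. Students can see the logic of a significance test without first meeting a formula.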


Statistical tests can tell students whether their hypothesis was supported: whether a
difference, for example, is ‘significant’. Computer programs and graphic calculators will
compute these values for them. However, it is more important that students understand
the statistical concepts underpinning the tests than that they can compute a χ² or
t-value that has no meaning for them (even if it is the appropriate statistical test). With
bivariate data it can be even more important to discuss the meaning with students. The
computer (or calculator) will produce a line of best fit, whether or not it is suitable.
Moving students away from just calculating numbers to engaging with meaning is one
of the advantages of self-directed investigations: the students are keen to make sense of
their data themselves.
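
The point about lines of best fit can be demonstrated with a small Python sketch. The data here are invented so that y is exactly the square of x, which is clearly not a linear relationship, and the function name is hypothetical. The least-squares calculation still returns a line and a high correlation coefficient. Only the pattern in the residuals reveals that a straight line is the wrong model.

```python
from math import sqrt
from statistics import mean


def least_squares_line(xs, ys):
    """Return slope, intercept and Pearson correlation for paired data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical data following a curve (y = x squared), not a line.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

slope, intercept, r = least_squares_line(xs, ys)

# Residual: observed value minus the value predicted by the fitted line.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

print(f"fitted line: y = {slope:.2f}x + {intercept:.2f}, r = {r:.3f}")
print("residuals:", [round(res, 2) for res in residuals])
```

The residuals are positive at both ends and negative in the middle. A systematic pattern like this, rather than random scatter, is a sign that students should question the straight line even when the correlation looks strong.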


Many of the concepts that underlie hypothesis testing statistics are complex, but this 
does not mean that your students should not be exposed to them. In fact, it offers a
chance to extend their understanding by reinforcing concepts such as variability,
randomness, the normal distribution and skewness. As well, the difference between
population and sample, and the importance of computing confidence intervals, can be
explained. (One table that may be useful, summarising statistical tests, their
assumptions, confidence intervals and conditions, is available at
http://statweb.calpoly.edu/mcarlton/food/FormulaSheet.doc)


Scientific and statistical literacy 


Statistical and scientific literacy are inevitably and thoroughly connected. To improve 
the scientific and investigative skills of your students automatically requires a 
concomitant improvement in their statistical skills. By embedding the learning in a
context that is relevant and engaging for the student, both should be enhanced. 


Statistical literacy resources are available from the ABS Education Services web site:
www.abs.gov.au/teachers and www.abs.gov.au/students